{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "084u8u0DpBlo"
      },
      "source": [
        "# Prompting with media files"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZFWzQEqNosrS"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://ai.google.dev/gemini-api/docs/prompting_with_media\"><img src=\"https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png\" height=\"32\" width=\"32\" />View on ai.google.dev</a>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemini-api/docs/prompting_with_media.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/google/generative-ai-docs/blob/main/site/en/gemini-api/docs/prompting_with_media.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3c5e92a74e64"
      },
      "source": [
        "The Gemini API supports *multimodal* prompting with text, image, and audio data. For small files, you can point the Gemini model\n",
        "directly to a local file when providing a prompt. Upload larger files with the\n",
        "[File API](https://ai.google.dev/api/rest/v1beta/files) before including them in\n",
        "prompts.\n",
        "\n",
        "The File API lets you store up to 20GB of files per project, with each file not\n",
        "exceeding 2GB in size. Files are stored for 48 hours and can be accessed with\n",
        "your API key for generation within that time period and cannot be downloaded from the API. It is available at no cost in all regions where the [Gemini API is\n",
        "available](https://ai.google.dev/available_regions).\n",
        "\n",
        "The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent). For information on valid file formats (MIME types) and supported models, see [Supported file formats](#supported_file_formats).\n",
        "\n",
        "This guide shows how to use the File API to upload media files and include them in a `GenerateContent` call to the Gemini API. For more information, see the [code\n",
        "samples](https://github.com/google-gemini/gemini-api-cookbook/tree/main/quickstarts/file-api).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VxCstRHvpX0r"
      },
      "source": [
        "## Setup\n",
        "\n",
        "Before you use the File API, you need to install the Gemini API SDK package and configure an API key. This section describes how to complete these setup steps."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G6J_rV2ipmj_"
      },
      "source": [
        "### Install the Python SDK and import packages\n",
        "\n",
        "The Python SDK for the Gemini API is contained in the [google-generativeai](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mN8x8DPgu9Kv"
      },
      "outputs": [],
      "source": [
        "!pip install -q -U google-generativeai"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NInUZ4hwDq6d"
      },
      "source": [
        "Import the necessary packages."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0x3xmmWrDtEH"
      },
      "outputs": [],
      "source": [
        "import google.generativeai as genai\n",
        "from IPython.display import Markdown"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l8g4hTRotheH"
      },
      "source": [
        "### Setup your API key\n",
        "\n",
        "The File API uses API keys for authentication and access. Uploaded files are associated with the project linked to the API key. Unlike other Gemini APIs that use API keys, your API key also grants access to data you've uploaded to the File API, so take extra care in keeping your API key secure. For more on keeping your keys\n",
        "secure, see [Best practices for using API\n",
        "keys](https://support.google.com/googleapi/answer/6310037).\n",
        "\n",
        "Store your API key in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or are unfamiliar with Colab Secrets, refer to the [Authentication quickstart](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "d6lYXRcjthKV"
      },
      "outputs": [],
      "source": [
        "from google.colab import userdata\n",
        "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')\n",
        "\n",
        "genai.configure(api_key=GOOGLE_API_KEY)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c-z4zsCUlaru"
      },
      "source": [
        "## Prompting with images\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rsdNkDszLBmQ"
      },
      "source": [
        "### Upload an image file to the File API"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "o1K81yn9mFBo"
      },
      "source": [
        "In this tutorial, you upload a sample image to the API and use it to generate content.\n",
        "\n",
        "Refer to the [Appendix section](#uploading_files_to_colab) to learn how to upload your own file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lC6sS6DnmGmi"
      },
      "outputs": [],
      "source": [
        "!curl -o image.jpg https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "N9NxXGZKKusG"
      },
      "outputs": [],
      "source": [
        "# Upload the file\n",
        "sample_file = genai.upload_file(path=\"image.jpg\",\n",
        "                            display_name=\"Sample drawing\")\n",
        "\n",
        "print(f\"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cto22vhKOvGQ"
      },
      "source": [
        "The `response` shows that the File API stored the specified `display_name` for the uploaded file and a `uri` to reference the file in Gemini API calls. Use `response` to track how uploaded files are mapped to URIs.\n",
        "\n",
        "Depending on your use case, you can also store the URIs in structures such as a `dict` or a database."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ds5iJlaembWe"
      },
      "source": [
        "### Get file\n",
        "\n",
        "After uploading the file, verify that the File API has successfully received it by calling `files.get`.\n",
        "\n",
        "`files.get` lets you get the file metadata that have been uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` (and by extension, the `uri`) are unique. Only use the `displayName` to identify files if you manage uniqueness yourself."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kLFsVLFHOWSV"
      },
      "outputs": [],
      "source": [
        "file = genai.get_file(name=sample_file.name)\n",
        "print(f\"Retrieved file '{file.display_name}' as: {sample_file.uri}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EPPOECHzsIGJ"
      },
      "source": [
        "### Generate content\n",
        "\n",
        "After uploading the file, you can make `GenerateContent` requests that reference the File API URI. In this example, you create a prompt that starts with a text followed by the uploaded image."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZYVFqmLkl5nE"
      },
      "outputs": [],
      "source": [
        "# Set the model to Gemini 1.5 Pro.\n",
        "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-pro-latest\")\n",
        "\n",
        "response = model.generate_content([\"Describe the image with a creative description.\", sample_file])\n",
        "\n",
        "Markdown(\">\" + response.text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IrPDYdQSKTg4"
      },
      "source": [
        "### Delete files\n",
        "\n",
        "Files are automatically deleted after 2 days. You can also manually delete them using `files.delete()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "d4eO8ZXoKdZf"
      },
      "outputs": [],
      "source": [
        "genai.delete_file(sample_file.name)\n",
        "print(f'Deleted {sample_file.display_name}.')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TaUZc1mvLkHY"
      },
      "source": [
        "## Prompting with videos"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MNvhBdoDFnTC"
      },
      "source": [
        "### Upload a video file to the File API\n",
        "\n",
        "The File API accepts video file formats directly. This example uses the short film \"Big Buck Bunny\".\n",
        "\n",
        "> \"Big Buck Bunny\" is (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org and [licensed](https://peach.blender.org/about/) under the [Creative Commons Attribution 3.0](http://creativecommons.org/licenses/by/3.0/) License.\n",
        "\n",
        "Refer to the [Appendix section](#uploading_files_to_colab) to learn how to upload your own file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "V4XeFdX1rxaE"
      },
      "outputs": [],
      "source": [
        "!wget https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "_HzrDdp2Q1Cu"
      },
      "outputs": [],
      "source": [
        "video_file_name = \"BigBuckBunny_320x180.mp4\"\n",
        "\n",
        "print(f\"Uploading file...\")\n",
        "video_file = genai.upload_file(path=video_file_name)\n",
        "print(f\"Completed upload: {video_file.uri}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oOZmTUb4FWOa"
      },
      "source": [
        "### Get file\n",
        "\n",
        "Verify the API has successfully received the files by calling the `files.get` method.\n",
        "\n",
        "NOTE: Video files have a `State` field in the File API. When a video is uploaded, it will be in `PROCESSING` state until it is ready for inference. Only `ACTIVE` files can be used for model inference."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SHMVCWHkFhJW"
      },
      "outputs": [],
      "source": [
        "import time\n",
        "\n",
        "while video_file.state.name == \"PROCESSING\":\n",
        "    print('.', end='')\n",
        "    time.sleep(10)\n",
        "    video_file = genai.get_file(video_file.name)\n",
        "\n",
        "if video_file.state.name == \"FAILED\":\n",
        "  raise ValueError(video_file.state.name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zS5NmQeXLqeS"
      },
      "source": [
        "### Generate content\n",
        "\n",
        "After the video has been uploaded, you can make `GenerateContent` requests that reference the File API URI."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ypZuGQ-2LqeS"
      },
      "outputs": [],
      "source": [
        "# Create the prompt.\n",
        "prompt = \"Describe this video.\"\n",
        "\n",
        "# Set the model to Gemini 1.5 Pro.\n",
        "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-pro-latest\")\n",
        "\n",
        "# Make the LLM request.\n",
        "print(\"Making LLM inference request...\")\n",
        "response = model.generate_content([prompt, video_file],\n",
        "                                  request_options={\"timeout\": 600})\n",
        "print(response.text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "diCy9BgjLqeS"
      },
      "source": [
        "### Delete files\n",
        "\n",
        "Files are automatically deleted after 2 days or you can manually delete them using `files.delete()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YYyi5PrKLqeb"
      },
      "outputs": [],
      "source": [
        "genai.delete_file(video_file.name)\n",
        "print(f'Deleted file {video_file.uri}')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OxgouC6i7exO"
      },
      "source": [
        "## Supported file formats\n",
        "\n",
        "Gemini models support prompting with multiple file formats. This section explains considerations in using general media formats for prompting, specifically image, audio, video, and plain text files. You can use media files for prompting only with specific model versions, as shown in the following table.\n",
        "\n",
        "<table>\n",
        "  <thead>\n",
        "    <tr>\n",
        "      <th><strong>Model</strong></th>\n",
        "      <th><strong>Images</strong></th>\n",
        "      <th><strong>Audio</strong></th>\n",
        "      <th><strong>Video</strong></th>\n",
        "      <th><strong>Plain text</strong></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <td>Gemini 1.5 Pro (release 008 and later)</td>\n",
        "      <td>✔ (3600 max image files)</td>\n",
        "      <td>✔</td>\n",
        "      <td>✔</td>\n",
        "      <td>✔</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <td>Gemini Pro Vision</td>\n",
        "      <td>✔ (16 max image files)</td>\n",
        "      <td></td>\n",
        "      <td></td>\n",
        "      <td>✔</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "\n",
        "### Image formats\n",
        "\n",
        "You can use image data for prompting Gemini models. When you use images for prompting, they are subject to the following limitations and requirements:\n",
        "\n",
        "-   Images must be in one of the following image data [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
        "    -   PNG - image/png\n",
        "    -   JPEG - image/jpeg\n",
        "    -   WEBP - image/webp\n",
        "    -   HEIC - image/heic\n",
        "    -   HEIF - image/heif\n",
        "-   Maximum of 3600 images for `gemini-1.5-pro`\n",
        "-   No specific limits to the number of pixels in an image; however, larger images are scaled down to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.\n",
        "\n",
        "### Audio formats\n",
        "\n",
        "You can use audio data for prompting with the `gemini-1.5-pro` model. When you use audio for prompting, they are subject to the following limitations and requirements:\n",
        "\n",
        "-   Audio data is supported in the following common audio format [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
        "    -   WAV - audio/wav\n",
        "    -   MP3 - audio/mp3\n",
        "    -   AIFF - audio/aiff\n",
        "    -   AAC - audio/aac\n",
        "    -   OGG Vorbis - audio/ogg\n",
        "    -   FLAC - audio/flac\n",
        "-   The maximum supported length of audio data in a single prompt is 9.5 hours.\n",
        "-   Audio files are resampled down to a 16 Kbps data resolution, and multiple channels of audio are combined into a single channel.\n",
        "-   There is no specific limit to the number of audio files in a single prompt, however the total combined length of all audio files in a single prompt cannot exceed 9.5 hours.\n",
        "\n",
        "### Video formats\n",
        "\n",
        "You can use video data for prompting with the `gemini-1.5-pro` model.\n",
        "\n",
        "- Video data is supported in the following common video format [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
        "  -   video/mp4\n",
        "  -   video/mpeg\n",
        "  -   video/mov\n",
        "  -   video/avi\n",
        "  -   video/x-flv\n",
        "  -   video/mpg\n",
        "  -   video/webm\n",
        "  -   video/wmv\n",
        "  -   video/3gpp\n",
        "\n",
        "- The File API service currently samples videos into images at 1 frame per second (FPS) and may be subject to change to provide the best inference quality.\n",
        "- Video limits depend on the context size limit of the model used for inference. For example, `gemini-1.5-pro` has a maximum video length of ~60 minutes.\n",
        "\n",
        "### Plain text formats\n",
        "\n",
        "The File API supports uploading plain text files with the following [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
        "-   text/plain\n",
        "-   text/html\n",
        "-   text/css\n",
        "-   text/javascript\n",
        "-   application/x-javascript\n",
        "-   text/x-typescript\n",
        "-   application/x-typescript\n",
        "-   text/csv\n",
        "-   text/markdown\n",
        "-   text/x-python\n",
        "-   application/x-python-code\n",
        "-   application/json\n",
        "-   text/xml\n",
        "-   application/rtf\n",
        "-   text/rtf\n",
        "\n",
        "For plain text files with a MIME type not on the list, you can try specifying\n",
        "one of the above MIME types manually."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rIoNRWn0nwUy"
      },
      "source": [
        "## Appendix: Uploading files to Colab\n",
        "<a name=\"uploading_files_to_colab\"></a>\n",
        "This notebook uses the File API with files that were downloaded from the internet. If you're running this in Colab and want to use your own files, you first need to upload them to the colab instance.\n",
        "\n",
        "First, click **Files** on the left sidebar, then click the **Upload** button:\n",
        "\n",
        "<img width=400 src=\"https://ai.google.dev/tutorials/images/colab_upload.png\">\n",
        "\n",
        "Next, you'll upload that file to the File API. In the form for the code cell below, enter the filename for the file you uploaded and provide an appropriate display name for the file, then run the cell.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VqAwyEa3nxaZ"
      },
      "outputs": [],
      "source": [
        "my_filename = \"image.jpg\" # @param {type:\"string\"}\n",
        "my_file_display_name = \"Sample Image\" # @param {type:\"string\"}\n",
        "\n",
        "my_file = genai.upload_file(path=my_filename,\n",
        "                            display_name=my_file_display_name)\n",
        "print(f\"Uploaded file '{my_file.display_name}' as: {my_file.uri}\")"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "prompting_with_media.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
